Towards autoscaling of Apache Flink jobs

Abstract

Data stream processing has been gaining attention in the past decade. Apache Flink is an open-source distributed engine that is able to process a large amount of data in real time with low latency. Computations are distributed among cluster nodes. Currently, provisioning the appropriate amount of cloud resources must be done manually ahead of time. A dynamically varying workload may exceed the capacity of the cluster, or leave resources underutilized. In our paper, we describe an architecture that enables the automatic scaling of Flink jobs on Kubernetes based on custom metrics, and describe a simple scaling policy. We also measure the effects of state size and target parallelism on the duration of the scaling operation, which must be considered when designing an autoscaling policy, so that the Flink job respects a Service Level Agreement.
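As a rough illustration of what a simple metric-driven scaling policy can look like, the sketch below applies the proportional rule used by the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × metric / target)) to a job's parallelism. It is not the policy from the paper: the function name, the parameter names, the parallelism bounds, and the 10% tolerance band are all assumptions chosen for the example.

```python
import math


def desired_parallelism(
    current: int,
    metric_value: float,
    target_value: float,
    min_parallelism: int = 1,    # hypothetical lower bound
    max_parallelism: int = 32,   # hypothetical upper bound
    tolerance: float = 0.1,      # assumed dead band around the target
) -> int:
    """Return a new parallelism from a custom metric and its target value.

    Uses the proportional rule desired = ceil(current * metric / target),
    skips changes while the metric stays within the tolerance band, and
    clamps the result to [min_parallelism, max_parallelism].
    """
    if current < 1:
        raise ValueError("current parallelism must be at least 1")
    if target_value <= 0:
        raise ValueError("target_value must be positive")

    ratio = metric_value / target_value

    # Avoid rescaling on small fluctuations: every rescale has a cost.
    if abs(ratio - 1.0) <= tolerance:
        return current

    proposed = math.ceil(current * ratio)
    return max(min_parallelism, min(max_parallelism, proposed))


if __name__ == "__main__":
    # Metric at 150% of its target with parallelism 4.
    print(desired_parallelism(current=4, metric_value=1.5, target_value=1.0))
```

Because the abstract reports that state size and target parallelism affect how long a scaling operation takes, a policy along these lines would likely also need a cooldown or a cost estimate before acting, which this sketch leaves out.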

Similar resources

Approximate Stream Analytics in Apache Flink and Apache Spark Streaming

Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, approximate computing — based on the chosen sample size — can make a systematic trade-off between the output accuracy and computation effi...

Development of a News Recommender System based on Apache Flink

The amount of data on the web is constantly growing. The separation of relevant from less important information is a challenging task. Due to the huge amount of data available in the World Wide Web, the processing cannot be done manually. Software components are needed that learn the user preferences and support users in finding the relevant information. In this work we present our recommender ...

A Study of Execution Strategies for openCypher on Apache Flink

The concept of big data has become popular in recent years due to the growing demand of handling datasets of large sizes. A lot of new frameworks have been proposed to deal with the problem of processing, analysis and storage of big data. As one of them, Apache Flink is an open source platform allowing for distributed stream and batch data processing. Cypher, a declarative query language develo...

On the usability of Hadoop MapReduce, Apache Spark & Apache Flink for data science

Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level, requiring many implementation steps even for simple analysis tasks. This has led to the development of advanced dataflow oriented platforms, most prominently...

State Management in Apache Flink®: Consistent Stateful Distributed Stream Processing

Stream processors are emerging in industry as an apparatus that drives analytical but also mission critical services handling the core of persistent application logic. Thus, apart from scalability and low-latency, a rising system need is first-class support for application state together with strong consistency guarantees, and adaptivity to cluster reconfigurations, software patches and partial...

Journal

Journal title: Acta Universitatis Sapientiae: Informatica

Year: 2021

ISSN: 1844-6086, 2066-7760

DOI: https://doi.org/10.2478/ausi-2021-0003